KL-Divergence Guided Temperature Sampling
Temperature sampling is a conventional approach to diversifying large language
model predictions. As the temperature increases, predictions become more diverse
but also more vulnerable to hallucinations -- generating tokens that are sensible
but not factual. One common approach to mitigating hallucinations is to provide
source/grounding documents and train the model to produce predictions that are
bound to, and attributable to, the provided source. There appears to be a
trade-off between diversity and attribution. To mitigate this trade-off, we
propose relaxing the constraint of a fixed temperature across decoding steps,
together with a mechanism that guides the dynamic temperature at each step
according to its relevance to the source, measured by KL-divergence. Our
experiments confirm the trade-off and show that our sampling algorithm
outperforms the conventional top-k and top-p algorithms on conversational
question-answering and summarization tasks.
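A minimal sketch of the idea follows, assuming the relevance signal is the KL
divergence between the source-conditioned and unconditioned next-token
distributions, and that the per-step temperature decreases as that divergence
grows. The mapping and hyper-parameters below are illustrative assumptions, not
the paper's exact algorithm.

    # Sketch of dynamic, KL-guided temperature sampling (illustrative only).
    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """KL(p || q) for two probability vectors over the vocabulary."""
        p = np.clip(p, eps, 1.0)
        q = np.clip(q, eps, 1.0)
        return float(np.sum(p * np.log(p / q)))

    def softmax(logits, temperature):
        z = logits / temperature
        z -= z.max()
        e = np.exp(z)
        return e / e.sum()

    def sample_next_token(cond_logits, uncond_logits,
                          t_min=0.3, t_max=1.2, scale=1.0, rng=None):
        """Pick a per-step temperature from how much the source shifts the
        next-token distribution, then sample from the source-conditioned
        logits. t_min, t_max, and scale are assumed hyper-parameters."""
        rng = rng or np.random.default_rng()
        p_cond = softmax(cond_logits, 1.0)
        p_uncond = softmax(uncond_logits, 1.0)
        kl = kl_divergence(p_cond, p_uncond)
        # High divergence -> the source matters here -> cooler, more faithful
        # sampling; low divergence -> allow a hotter, more diverse temperature.
        temperature = t_min + (t_max - t_min) * np.exp(-scale * kl)
        probs = softmax(cond_logits, temperature)
        return int(rng.choice(len(probs), p=probs)), temperature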
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2)
increasing model size can improve the performance of Transformer-based neural
models. In this paper, we present a new model, called LongT5, with which we
explore the effects of scaling both the input length and model size at the same
time. Specifically, we integrated attention ideas from long-input transformers
(ETC) and adopted pre-training strategies from summarization pre-training
(PEGASUS) into the scalable T5 architecture. The result is a new attention
mechanism we call Transient Global (TGlobal), which mimics ETC's local/global
attention mechanism but without requiring additional side-inputs. We are able
to achieve state-of-the-art results on several summarization tasks and
outperform the original T5 models on question answering tasks. (Accepted in
NAACL 2022.)
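A rough sketch of the transient-global idea is shown below. The block size,
local radius, and use of simple mean-pooled block summaries as the "transient"
global tokens are assumptions for illustration; the released LongT5
implementation differs in details such as relative position biases and learned
projections.

    # Sketch of local attention plus on-the-fly block-summary ("transient
    # global") tokens, in the spirit of TGlobal (illustrative only).
    import numpy as np

    def tglobal_attention(x, radius=2, block=4):
        """x: (seq_len, d_model). Each token attends to a local window plus
        per-block summary tokens computed on the fly, with no side-inputs."""
        n, d = x.shape
        # Transient global tokens: mean of each fixed-size block of the input.
        n_blocks = int(np.ceil(n / block))
        globals_ = np.stack([x[i * block:(i + 1) * block].mean(axis=0)
                             for i in range(n_blocks)])
        out = np.zeros_like(x)
        for i in range(n):
            lo, hi = max(0, i - radius), min(n, i + radius + 1)
            keys = np.concatenate([x[lo:hi], globals_], axis=0)  # local + global
            scores = keys @ x[i] / np.sqrt(d)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[i] = weights @ keys  # values taken equal to keys in this sketch
        return out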
Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts
Soft prompts have recently been proposed as a tool for adapting large frozen
language models (LMs) to new tasks. In this work, we repurpose soft prompts for
the task of injecting world knowledge into LMs. We introduce a method to train
soft prompts via self-supervised learning on data from knowledge bases. The
resulting soft knowledge prompts (KPs) are task-independent and work as an
external memory for the LMs. We perform qualitative and quantitative experiments
and demonstrate that: (1) KPs can effectively model the structure of the
training data; (2) KPs can be used to improve the performance of LMs on
different knowledge-intensive tasks.
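A minimal sketch of how such prompts might act as an external memory for a
frozen LM: learnable prompt vectors are looked up and prepended to the input
embeddings while the LM's own weights stay frozen. The per-entity lookup,
prompt length, and interface below are assumptions for illustration, not the
authors' implementation.

    # Sketch of soft knowledge prompts prepended to a frozen LM's inputs.
    import torch
    import torch.nn as nn

    class KnowledgePrompts(nn.Module):
        def __init__(self, num_entities, prompt_len, d_model):
            super().__init__()
            # One trainable soft prompt (prompt_len x d_model) per KB entity.
            self.prompts = nn.Embedding(num_entities, prompt_len * d_model)
            self.prompt_len, self.d_model = prompt_len, d_model

        def forward(self, entity_ids, token_embeds):
            """entity_ids: (batch,), token_embeds: (batch, seq, d_model).
            Returns embeddings with each entity's soft prompt prepended; the
            frozen LM then runs on the extended sequence unchanged."""
            kp = self.prompts(entity_ids).view(-1, self.prompt_len, self.d_model)
            return torch.cat([kp, token_embeds], dim=1)

    # Usage note: only self.prompts would be updated during training, e.g.
    # for p in frozen_lm.parameters(): p.requires_grad_(False)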
- …